Artifact reduction for solutions to inverse problems

ABSTRACT

A method includes determining, using a physics-based model and based on a plurality of observations, first solution data. The first solution data is descriptive of a first estimated solution to an inverse problem associated with the plurality of observations, and the first solution data includes artifacts due, at least in part, to a count of observations of the plurality of observations. The method also includes performing a plurality of iterations of a gradient descent artifact reduction process to generate second solution data. The artifacts are reduced in the second solution data relative to the first solution data. A particular iteration of the gradient descent artifact reduction process includes determining, using a machine-learning model, a value of a gradient metric associated with particular solution data and adjusting the particular solution data based on the value of the gradient metric to generate updated solution data.

CROSS-REFERENCE TO RELATED APPLICATIONS

The present application claims priority from U.S. Provisional Patent Application No. 63,255,322, entitled “ARTIFACT REDUCTION FOR SOLUTIONS TO INVERSE PROBLEMS”, filed Oct. 13, 2021, and claims priority from U.S. Provisional Patent Application No. 63,263,532, entitled “ARTIFACT REDUCTION FOR SOLUTIONS TO INVERSE PROBLEMS”, filed Nov. 4, 2021, and claims priority from U.S. Provisional Patent Application No. 63,362,787, entitled “DATA SELECTION FOR IMAGE GENERATION”, filed Apr. 11, 2022, and claims priority from U.S. Provisional Patent Application No. 63,362,789, entitled “IMAGE ARTIFACT REDUCTION USING FILTER DATA BASED ON DEEP IMAGE PRIOR OPERATIONS”, filed Apr. 11, 2022, and claims priority from U.S. Provisional Patent Application No. 63,362,792, entitled “RELIABILITY FOR MACHINE-LEARNING BASED IMAGE GENERATION”, filed Apr. 11, 2022, the contents of each of which are incorporated herein by reference in their entirety.

FIELD

The present disclosure is generally related to using a machine-learning model to facilitate reduction of artifacts in a solution to an inverse problem.

BACKGROUND

Conceptually, a “forward problem” attempts to determine or predict a set of observations based on a model of causal factors associated with a system and initial conditions of the system. An “inverse problem” reverses the forward problem by attempting to model causal factors and initial conditions based on a set of observations. Stated another way, an inverse problem starts with the effects (e.g., the observations) and attempts to determine model parameters, whereas the forward problem starts with the causes (e.g., a model of the system) and attempts to determine the effects. Inverse problems are used for many remote sensing applications, such as radar, sonar, medical imaging, computer vision, seismic imaging, etc.
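As an illustrative, non-limiting sketch of the distinction drawn above, the following Python example uses a toy straight-line system. The function names, the linear relationship, and the sample values are assumptions chosen only for illustration and are not part of the disclosure. The forward function predicts observations from known model parameters, and the inverse function estimates the model parameters from observations using ordinary least squares.

```python
# Hypothetical toy system: each observation depends linearly on a position.

def forward(slope, intercept, positions):
    """Forward problem: predict observations from known model parameters."""
    return [slope * p + intercept for p in positions]


def inverse(positions, observations):
    """Inverse problem: estimate model parameters from observations
    using the closed-form least-squares fit of a straight line."""
    n = len(positions)
    mean_p = sum(positions) / n
    mean_o = sum(observations) / n
    cov = sum((p - mean_p) * (o - mean_o)
              for p, o in zip(positions, observations))
    var = sum((p - mean_p) ** 2 for p in positions)
    slope = cov / var
    intercept = mean_o - slope * mean_p
    return slope, intercept


positions = [0.0, 1.0, 2.0, 3.0]
observations = forward(2.0, 1.0, positions)                   # causes -> effects
est_slope, est_intercept = inverse(positions, observations)   # effects -> causes
```

In this toy case the four observations are sufficient to recover the slope and intercept; with a single observation, many different lines would fit equally well.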

Optimization techniques are commonly used to generate solutions to inverse problems. For example, with particular assumptions about a system that generated a set of return data (e.g., observations), a reverse time migration technique can be used to generate image data representing the system. However, images generated using such techniques generally include artifacts. As used herein, artifacts are distinct from noise. For example, artifacts may result from the inverse problem being ill-posed (e.g., having more than one possible solution based on the assumptions and the available observations), from modeling constraints, from inconsistent discretization of numerical schemes, and/or from modeling approximations, as a few examples. Such artifacts can be reduced by increasing the number of observations used to generate the solution; however, generating more observations is costly and time-consuming. Furthermore, the computing resources required to perform optimization increase dramatically as the number of observations increases.

SUMMARY

The present disclosure describes systems and methods that use machine-learning to facilitate reduction of artifacts in a solution to an inverse problem.

In some aspects, a method includes determining, using a physics-based model and based on a plurality of observations, first solution data. The first solution data is descriptive of a first estimated solution to an inverse problem associated with the plurality of observations, and the first solution data includes artifacts due, at least in part, to a count of observations of the plurality of observations. The method also includes performing a plurality of iterations of a gradient descent artifact reduction process to generate second solution data. The artifacts are reduced in the second solution data relative to the first solution data. A particular iteration of the gradient descent artifact reduction process includes determining, using a machine-learning model, a value of a gradient metric associated with particular solution data and adjusting the particular solution data based on the value of the gradient metric to generate updated solution data.

In some aspects, a system includes one or more processors configured to determine, using a physics-based model and based on a plurality of observations, first solution data. The first solution data is descriptive of a first estimated solution to an inverse problem associated with the plurality of observations, and the first solution data includes artifacts due, at least in part, to a count of observations of the plurality of observations. The one or more processors are further configured to perform a plurality of iterations of a gradient descent artifact reduction process to generate second solution data. The artifacts are reduced in the second solution data relative to the first solution data. A particular iteration of the gradient descent artifact reduction process includes determining, using a machine-learning model, a value of a gradient metric associated with particular solution data and adjusting the particular solution data based on the value of the gradient metric to generate updated solution data.

In some aspects, a computer-readable storage device stores instructions. The instructions, when executed by one or more processors, cause the one or more processors to determine, using a physics-based model and based on a plurality of observations, first solution data. The first solution data is descriptive of a first estimated solution to an inverse problem associated with the plurality of observations, and the first solution data includes artifacts due, at least in part, to a count of observations of the plurality of observations. The instructions, when executed by one or more processors, further cause the one or more processors to perform a plurality of iterations of a gradient descent artifact reduction process to generate second solution data. The artifacts are reduced in the second solution data relative to the first solution data. A particular iteration of the gradient descent artifact reduction process includes determining, using a machine-learning model, a value of a gradient metric associated with particular solution data and adjusting the particular solution data based on the value of the gradient metric to generate updated solution data.

In some aspects, a method includes obtaining a first batch of solution data. Each set of solution data of the first batch corresponds to a physics-based solution to an inverse problem, each set of solution data of the first batch is associated with a respective artifact level, and the first batch includes sets of solution data associated with different artifact levels. The method also includes generating training data based on the first batch. Training data associated with a particular artifact level is determined based on differences between a set of solution data associated with a lowest artifact level and a set of solution data associated with the particular artifact level. The method further includes training a score matching network using the training data.

In some aspects, a system includes one or more processors configured to obtain a first batch of solution data. Each set of solution data of the first batch corresponds to a physics-based solution to an inverse problem, each set of solution data of the first batch is associated with a respective artifact level, and the first batch includes sets of solution data associated with different artifact levels. The one or more processors are further configured to generate training data based on the first batch. Training data associated with a particular artifact level is determined based on differences between a set of solution data associated with a lowest artifact level and a set of solution data associated with the particular artifact level. The one or more processors are further configured to train a score matching network using the training data.

In some aspects, a computer-readable storage device stores instructions. The instructions, when executed by one or more processors, cause the one or more processors to obtain a first batch of solution data. Each set of solution data of the first batch corresponds to a physics-based solution to an inverse problem, each set of solution data of the first batch is associated with a respective artifact level, and the first batch includes sets of solution data associated with different artifact levels. The instructions, when executed by the one or more processors, further cause the one or more processors to generate training data based on the first batch. Training data associated with a particular artifact level is determined based on differences between a set of solution data associated with a lowest artifact level and a set of solution data associated with the particular artifact level. The instructions, when executed by the one or more processors, further cause the one or more processors to train a score matching network using the training data.
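As an illustrative, non-limiting sketch of the training data generation described in the aspects above, the following Python example builds one training example per artifact level from the element-wise difference between the set of solution data having the lowest artifact level and the set of solution data having that artifact level. The dictionary layout, the function name, the exclusion of the lowest artifact level itself, and the sample values are assumptions made for illustration; the disclosure states only that the training data is determined based on such differences. Training of the score matching network itself is not shown.

```python
def build_training_data(batch):
    """Build training examples from a batch of solution data.

    batch maps an artifact level (a number; lower means fewer artifacts)
    to a set of solution data (a list of floats of equal length).
    """
    lowest_level = min(batch)
    reference = batch[lowest_level]          # lowest-artifact solution data
    training_data = []
    for level, solution in sorted(batch.items()):
        if level == lowest_level:
            continue                         # assumption: skip the reference
        difference = [r - s for r, s in zip(reference, solution)]
        training_data.append(
            {"level": level, "input": solution, "target": difference}
        )
    return training_data


# Hypothetical batch with three artifact levels.
batch = {0: [1.0, 2.0], 1: [1.5, 2.0], 2: [3.0, 1.0]}
training_data = build_training_data(batch)
```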

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates an example of a computer system configured to use machine-learning to facilitate reduction of artifacts in a solution to an inverse problem.

FIG. 2 is a flow chart of an example of a method of reducing artifacts in solution data associated with an inverse problem.

FIG. 3 is a flow chart of an example of a method of training a machine-learning model of a gradient descent artifact reduction system.

FIG. 4 is a diagram illustrating particular aspects of determining parameters of a gradient descent artifact reduction system.

FIG. 5A is a diagram illustrating a solution generated via reverse time migration based on a large number of observations.

FIG. 5B is a diagram illustrating a solution generated via reverse time migration based on a much smaller number of observations than used for FIG. 5A.

FIG. 5C is a diagram illustrating a solution generated, based on the same number of observations as used for FIG. 5B, via reverse time migration and gradient descent artifact reduction, according to particular aspects disclosed herein.

FIG. 6A is a diagram illustrating a solution generated via reverse time migration based on a large number of observations.

FIG. 6B is a diagram illustrating a solution generated via reverse time migration based on a much smaller number of observations than used for FIG. 6A.

FIG. 6C is a diagram illustrating a solution generated, based on the same number of observations as used for FIG. 6B, via reverse time migration and gradient descent artifact reduction, according to particular aspects disclosed herein.

DETAILED DESCRIPTION

The present disclosure describes systems and methods that use machine-learning to facilitate reduction of artifacts in a solution to an inverse problem. According to a particular aspect, an artifact reduction process uses physics-based modelling (such as reverse time migration) and a gradient descent artifact reduction process (an iterative search using a machine-learning model, such as a score matching generative model). In some implementations, a solution generated by the physics-based modelling is used to initialize the gradient descent artifact reduction process. In some such implementations, a solution generated by the gradient descent artifact reduction process is further refined using the physics-based modelling. For example, the artifact reduction process may perform multiple iterations that include the physics-based modelling and the gradient descent artifact reduction process. In a particular implementation, the physics-based modelling includes reverse time migration operations based on a small number of observations, and the gradient descent artifact reduction process reduces an artifact level of a solution generated by the reverse time migration operations to a level associated with a much larger number of observations. To illustrate, the artifact reduction process may generate solutions with artifact levels comparable to artifact levels associated with reverse time migration operations using an order of magnitude or more observations than the artifact reduction process uses to generate the solutions. As a specific example, during testing based on simulated seismic sensing observations, the artifact reduction process was able to generate high quality images using fewer than 10 observations, whereas reverse time migration alone used more than 200 observations to generate images of comparable quality (in terms of visually detectable image artifacts).
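As an illustrative, non-limiting sketch of the overall flow described above, the following Python example initializes a solution using a stand-in for the physics-based modelling and then performs multiple iterations in which a gradient metric is determined and the solution is adjusted based on its value. Every function name and value here is an assumption made for illustration. In particular, the gradient_metric function is only a placeholder for a trained machine-learning model: it points toward a fixed reference solution, which would not be available in practice, and the averaging step is not reverse time migration.

```python
def physics_based_estimate(observations):
    """Stand-in for physics-based modelling: average the observations
    element-wise to obtain first solution data."""
    count = len(observations)
    length = len(observations[0])
    return [sum(obs[i] for obs in observations) / count for i in range(length)]


def gradient_metric(solution, reference):
    """Placeholder for the machine-learning model. A trained model would
    estimate this direction from data; here it is computed from an assumed
    low-artifact reference purely for illustration."""
    return [r - s for s, r in zip(solution, reference)]


def artifact_level(solution, reference):
    """Toy artifact measure: largest absolute deviation from the reference."""
    return max(abs(s - r) for s, r in zip(solution, reference))


def reduce_artifacts(first_solution, reference, step_size=0.5, iterations=20):
    """Iteratively adjust the solution based on the gradient metric."""
    solution = list(first_solution)
    for _ in range(iterations):
        grad = gradient_metric(solution, reference)
        solution = [s + step_size * g for s, g in zip(solution, grad)]
    return solution


observations = [[1.0, 0.0, 2.0], [3.0, 0.0, 0.0]]   # deliberately few observations
reference = [1.0, 0.0, 1.0]                          # assumed low-artifact solution
first_solution = physics_based_estimate(observations)
second_solution = reduce_artifacts(first_solution, reference)
```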

Particular aspects of the present disclosure are described below with reference to the drawings. In the description, common features are designated by common reference numbers throughout the drawings. As used herein, various terminology is used for the purpose of describing particular implementations only and is not intended to be limiting. For example, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. Further, the terms “comprise,” “comprises,” and “comprising” may be used interchangeably with “include,” “includes,” or “including.” Additionally, the term “wherein” may be used interchangeably with “where.” As used herein, “exemplary” may indicate an example, an implementation, and/or an aspect, and should not be construed as limiting or as indicating a preference or a preferred implementation. As used herein, an ordinal term (e.g., “first,” “second,” “third,” etc.) used to modify an element, such as a structure, a component, an operation, etc., does not by itself indicate any priority or order of the element with respect to another element, but rather merely distinguishes the element from another element having a same name (but for use of the ordinal term). As used herein, the term “set” refers to a grouping of one or more elements, and the term “plurality” refers to multiple elements.

In the present disclosure, terms such as “determining,” “calculating,” “estimating,” “shifting,” “adjusting,” etc. may be used to describe how one or more operations are performed. Such terms are not to be construed as limiting and other techniques may be utilized to perform similar operations. Additionally, as referred to herein, “generating,” “calculating,” “estimating,” “using,” “selecting,” “accessing,” and “determining” may be used interchangeably. For example, “generating,” “calculating,” “estimating,” or “determining” a parameter (or a signal) may refer to actively generating, estimating, calculating, or determining the parameter (or the signal) or may refer to using, selecting, or accessing the parameter (or signal) that is already generated, such as by another component or device.

As used herein, “coupled” may include “communicatively coupled,” “electrically coupled,” or “physically coupled,” and may also (or alternatively) include any combinations thereof. Two devices (or components) may be coupled (e.g., communicatively coupled, electrically coupled, or physically coupled) directly or indirectly via one or more other devices, components, wires, buses, networks (e.g., a wired network, a wireless network, or a combination thereof), etc. Two devices (or components) that are electrically coupled may be included in the same device or in different devices and may be connected via electronics, one or more connectors, or inductive coupling, as illustrative, non-limiting examples. In some implementations, two devices (or components) that are communicatively coupled, such as in electrical communication, may send and receive electrical signals (digital signals or analog signals) directly or indirectly, such as via one or more wires, buses, networks, etc. As used herein, “directly coupled” may include two devices that are coupled (e.g., communicatively coupled, electrically coupled, or physically coupled) without intervening components.

As used herein, the term “machine learning” should be understood to have any of its usual and customary meanings within the fields of computer science and data science, such meanings including, for example, processes or techniques by which one or more computers can learn to perform some operation or function without being explicitly programmed to do so. As a typical example, machine learning can be used to enable one or more computers to analyze data to identify patterns in data and generate a result based on the analysis. For certain types of machine learning, the results that are generated include data that indicates an underlying structure or pattern of the data itself. Such techniques, for example, include so-called “clustering” techniques, which identify clusters (e.g., groupings of data elements of the data).

For certain types of machine learning, the results that are generated include a data model (also referred to as a “machine-learning model” or simply a “model”). Typically, a model is generated using a first data set to facilitate analysis of a second data set. For example, a first portion of a large body of data may be used to generate a model that can be used to analyze the remaining portion of the large body of data. As another example, a set of historical data can be used to generate a model that can be used to analyze future data.

Since a model can be used to evaluate a set of data that is distinct from the data used to generate the model, the model can be viewed as a type of software (e.g., instructions, parameters, or both) that is automatically generated by the computer(s) during the machine learning process. As such, the model can be portable (e.g., can be generated at a first computer, and subsequently moved to a second computer for further training, for use, or both). Additionally, a model can be used in combination with one or more other models to perform a desired analysis. To illustrate, first data can be provided as input to a first model to generate first model output data, which can be provided (alone, with the first data, or with other data) as input to a second model to generate second model output data indicating a result of a desired analysis. Depending on the analysis and data involved, different combinations of models may be used to generate such results. In some examples, multiple models may provide model output that is input to a single model. In some examples, a single model provides model output to multiple models as input.
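As an illustrative, non-limiting sketch of using models in combination, the following Python example provides first data to a first model and provides the first model output data, together with the first data, as input to a second model. Both models are hypothetical hand-written functions rather than trained models, and the feature names and decision rule are assumptions made only to show the data flow.

```python
def first_model(first_data):
    """Hypothetical first model: summarizes the input data."""
    return {"mean": sum(first_data) / len(first_data), "peak": max(first_data)}


def second_model(first_output, first_data):
    """Hypothetical second model: uses the first model output data
    together with the first data to produce the final result."""
    ratio_exceeded = first_output["peak"] > 2 * first_output["mean"]
    return "anomalous" if ratio_exceeded and len(first_data) > 1 else "normal"


def analyze(first_data):
    """Chain the two models to perform the desired analysis."""
    first_output = first_model(first_data)
    return second_model(first_output, first_data)


result_spiky = analyze([1.0, 1.0, 1.0, 9.0])
result_flat = analyze([1.0, 1.0])
```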

Examples of machine-learning models include, without limitation, perceptrons, neural networks, support vector machines, regression models, decision trees, Bayesian models, Boltzmann machines, adaptive neuro-fuzzy inference systems, as well as combinations, ensembles and variants of these and other types of models. Variants of neural networks include, for example and without limitation, prototypical networks, autoencoders, transformers, self-attention networks, convolutional neural networks, deep neural networks, deep belief networks, etc. Variants of decision trees include, for example and without limitation, random forests, boosted decision trees, etc.

Since machine-learning models are generated by computer(s) based on input data, machine-learning models can be discussed in terms of at least two distinct time windows: a creation/training phase and a runtime phase. During the creation/training phase, a model is created, trained, adapted, validated, or otherwise configured by the computer based on the input data (which, in the creation/training phase, is generally referred to as “training data”). Note that the trained model corresponds to software that has been generated and/or refined during the creation/training phase to perform particular operations, such as classification, prediction, encoding, or other data analysis or data synthesis operations. During the runtime phase (or “inference” phase), the model is used to analyze input data to generate model output. The content of the model output depends on the type of model. For example, a model can be trained to perform classification tasks or regression tasks, as non-limiting examples. In some implementations, a model may be continuously, periodically, or occasionally updated, in which case training time and runtime may be interleaved or one version of the model can be used for inference while a copy is updated, after which the updated copy may be deployed for inference.

In some implementations, a previously generated model is trained (or re-trained) using a machine-learning technique. In this context, “training” refers to adapting the model or parameters of the model to a particular data set. Unless otherwise clear from the specific context, the term “training” as used herein includes “re-training” or refining a model for a specific data set. For example, training may include so-called “transfer learning.” As described further below, in transfer learning a base model may be trained using a generic or typical data set, and the base model may be subsequently refined (e.g., re-trained or further trained) using a more specific data set.

A data set used during training is referred to as a “training data set” or simply “training data”. The data set may be labeled or unlabeled. “Labeled data” refers to data that has been assigned a categorical label indicating a group or category with which the data is associated, and “unlabeled data” refers to data that is not labeled. Typically, “supervised machine-learning processes” use labeled data to train a machine-learning model, and “unsupervised machine-learning processes” use unlabeled data to train a machine-learning model; however, it should be understood that a label associated with data is itself merely another data element that can be used in any appropriate machine-learning process. To illustrate, many clustering operations can operate using unlabeled data; however, such a clustering operation can use labeled data by ignoring labels assigned to data or by treating the labels the same as other data elements.

Machine-learning models can be initialized from scratch (e.g., by a user, such as a data scientist) or using a guided process (e.g., using a template or previously built model). Initializing the model includes specifying parameters and hyperparameters of the model. “Hyperparameters” are characteristics of a model that are not modified during training, and “parameters” of the model are characteristics of the model that are modified during training. The term “hyperparameters” may also be used to refer to parameters of the training process itself, such as a learning rate of the training process. In some examples, the hyperparameters of the model are specified based on the task the model is being created for, such as the type of data the model is to use, the goal of the model (e.g., classification, regression, anomaly detection), etc. The hyperparameters may also be specified based on other design goals associated with the model, such as a memory footprint limit, where and when the model is to be used, etc.
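As an illustrative, non-limiting sketch of the distinction between hyperparameters and parameters, the following Python example stores hyperparameters in an immutable structure and stores parameters in a structure that a training step modifies. The class names, field names, and the simple update rule are assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Hyperparameters:
    """Characteristics that are not modified during training."""
    hidden_units: int
    learning_rate: float   # a hyperparameter of the training process itself


@dataclass
class Model:
    hyperparameters: Hyperparameters
    weights: list = field(default_factory=list)   # parameters, modified by training


def training_step(model, gradients):
    """Modify the parameters only; the hyperparameters are left untouched."""
    rate = model.hyperparameters.learning_rate
    model.weights = [w - rate * g for w, g in zip(model.weights, gradients)]


model = Model(Hyperparameters(hidden_units=2, learning_rate=0.1), [1.0, 2.0])
training_step(model, [1.0, -1.0])
```

Because the hyperparameter structure is frozen, an attempt to assign to one of its fields raises an error, which mirrors the statement that hyperparameters are not modified during training.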

Model type and model architecture of a model illustrate a distinction between model generation and model training. The model type of a model, the model architecture of the model, or both, can be specified by a user or can be automatically determined by a computing device. However, neither the model type nor the model architecture of a particular model is changed during training of the particular model. Thus, the model type and model architecture are hyperparameters of the model, and specifying the model type and model architecture is an aspect of model generation (rather than an aspect of model training). In this context, a “model type” refers to the specific type or sub-type of the machine-learning model. As noted above, examples of machine-learning model types include, without limitation, perceptrons, neural networks, support vector machines, regression models, decision trees, Bayesian models, Boltzmann machines, adaptive neuro-fuzzy inference systems, as well as combinations, ensembles and variants of these and other types of models. In this context, “model architecture” (or simply “architecture”) refers to the number and arrangement of model components, such as nodes or layers, of a model, and which model components provide data to or receive data from other model components. As a non-limiting example, the architecture of a neural network may be specified in terms of nodes and links. To illustrate, a neural network architecture may specify the number of nodes in an input layer of the neural network, the number of hidden layers of the neural network, the number of nodes in each hidden layer, the number of nodes of an output layer, and which nodes are connected to other nodes (e.g., to provide input or receive output). As another non-limiting example, the architecture of a neural network may be specified in terms of layers. To illustrate, the neural network architecture may specify the number and arrangement of specific types of functional layers, such as long short-term memory (LSTM) layers, fully connected (FC) layers, convolution layers, etc. While the architecture of a neural network implicitly or explicitly describes links between nodes or layers, the architecture does not specify link weights. Rather, link weights are parameters of a model (rather than hyperparameters of the model) and are modified during training of the model.
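As an illustrative, non-limiting sketch of an architecture specified in terms of layers, the following Python example describes a small fully connected network by the number of nodes in each layer and creates the link weights in a separate step. The layer sizes, dictionary keys, and random initialization range are assumptions made for illustration. The architecture description contains no link weights; the weights are generated afterward and would be modified during training.

```python
import random

# Architecture: number and arrangement of layers only, no link weights.
architecture = [
    {"type": "input", "nodes": 3},
    {"type": "fully_connected", "nodes": 4},
    {"type": "fully_connected", "nodes": 1},
]


def initialize_link_weights(architecture, seed=0):
    """Create one weight matrix per pair of adjacent layers.

    Each matrix has one row per node of the receiving layer and one
    column per node of the providing layer.
    """
    rng = random.Random(seed)
    weights = []
    for previous, current in zip(architecture, architecture[1:]):
        matrix = [
            [rng.uniform(-1.0, 1.0) for _ in range(previous["nodes"])]
            for _ in range(current["nodes"])
        ]
        weights.append(matrix)
    return weights


link_weights = initialize_link_weights(architecture)
```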

In many implementations, a data scientist selects the model type before training begins. However, in some implementations, a user may specify one or more goals (e.g., classification or regression), and automated tools may select one or more model types that are compatible with the specified goal(s). In such implementations, more than one model type may be selected, and one or more models of each selected model type can be generated and trained. A best performing model (based on specified criteria) can be selected from among the models representing the various model types. Note that in this process, no particular model type is specified in advance by the user, yet the models are trained according to their respective model types. Thus, the model type of any particular model does not change during training.

Similarly, in some implementations, the model architecture is specified in advance (e.g., by a data scientist); whereas in other implementations, a process that both generates and trains a model is used. Generating (or generating and training) the model using one or more machine-learning techniques is referred to herein as “automated model building”. In one example of automated model building, an initial set of candidate models is selected or generated, and then one or more of the candidate models are trained and evaluated. In some implementations, after one or more rounds of changing hyperparameters and/or parameters of the candidate model(s), one or more of the candidate models may be selected for deployment (e.g., for use in a runtime phase).

Certain aspects of an automated model building process may be defined in advance (e.g., based on user settings, default values, or heuristic analysis of a training data set) and other aspects of the automated model building process may be determined using a randomized process. For example, the architectures of one or more models of the initial set of models can be determined randomly within predefined limits. As another example, a termination condition may be specified by the user or based on configuration settings. The termination condition indicates when the automated model building process should stop. To illustrate, a termination condition may indicate a maximum number of iterations of the automated model building process, in which case the automated model building process stops when an iteration counter reaches a specified value. As another illustrative example, a termination condition may indicate that the automated model building process should stop when a reliability metric associated with a particular model satisfies a threshold. As yet another illustrative example, a termination condition may indicate that the automated model building process should stop if a metric that indicates improvement of one or more models over time (e.g., between iterations) satisfies a threshold. In some implementations, multiple termination conditions, such as an iteration count condition, a time limit condition, and a rate of improvement condition can be specified, and the automated model building process can stop when one or more of these conditions is satisfied.
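As an illustrative, non-limiting sketch of combining multiple termination conditions, the following Python example stops a loop when any one of a reliability threshold, a rate of improvement threshold, a time limit, or an iteration count is satisfied. The function name, the callback evaluate_round, the default thresholds, and the order in which the conditions are checked are all assumptions made for illustration.

```python
import time


def automated_model_building(evaluate_round, max_iterations=50,
                             reliability_threshold=0.95,
                             min_improvement=1e-3, time_limit_s=10.0):
    """Run rounds until any termination condition is satisfied.

    evaluate_round(i) is a hypothetical callback that performs round i
    and returns a reliability score for the best model of that round.
    Returns (best score, rounds performed, reason for stopping).
    """
    start = time.monotonic()
    best = float("-inf")
    for iteration in range(1, max_iterations + 1):
        score = evaluate_round(iteration)
        improvement = score - best
        best = max(best, score)
        if best >= reliability_threshold:
            return best, iteration, "reliability threshold"
        if iteration > 1 and improvement < min_improvement:
            return best, iteration, "rate of improvement"
        if time.monotonic() - start > time_limit_s:
            return best, iteration, "time limit"
    return best, max_iterations, "iteration count"


# Hypothetical scores for four rounds; the last one meets the threshold.
scores = [0.5, 0.7, 0.8, 0.97]
result = automated_model_building(lambda i: scores[i - 1])
```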

Another example of training a previously generated model is transfer learning. “Transfer learning” refers to initializing a model for a particular data set using a model that was trained using a different data set. For example, a “general purpose” model can be trained to detect anomalies in vibration data associated with a variety of types of rotary equipment, and the general-purpose model can be used as the starting point to train a model for one or more specific types of rotary equipment, such as a first model for generators and a second model for pumps. As another example, a general-purpose natural-language processing model can be trained using a large selection of natural-language text in one or more target languages. In this example, the general-purpose natural-language processing model can be used as a starting point to train one or more models for specific natural-language processing tasks, such as translation between two languages, question answering, or classifying the subject matter of documents. Often, transfer learning can converge to a useful model more quickly than building and training the model from scratch.
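As an illustrative, non-limiting sketch of transfer learning, the following Python example trains a one-parameter base model on a generic data set and then uses the resulting parameter as the starting point for training on a more specific data set. The model form (output equals weight times input), the data values, the learning rate, and the stopping tolerance are assumptions made for illustration. For comparison, the same specific data set is also used to train a model from scratch.

```python
def train(weight, data, learning_rate=0.05, tolerance=1e-4, max_steps=1000):
    """Gradient descent on mean squared error for the model y = weight * x.

    Returns the trained weight and the number of update steps performed.
    """
    for step in range(max_steps):
        gradient = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        if abs(gradient) < tolerance:
            return weight, step
        weight -= learning_rate * gradient
    return weight, max_steps


generic_data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]     # roughly y = 2.0 * x
specific_data = [(1.0, 2.2), (2.0, 4.4), (3.0, 6.6)]    # roughly y = 2.2 * x

base_weight, _ = train(0.0, generic_data)                        # base model
tuned_weight, tuned_steps = train(base_weight, specific_data)    # transfer learning
scratch_weight, scratch_steps = train(0.0, specific_data)        # from scratch
```

In this toy setting, starting from the base model requires fewer update steps than starting from scratch because the base parameter is already close to the value suited to the specific data set.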

Training a model based on a training data set generally involves changing parameters of the model with a goal of causing the output of the model to have particular characteristics based on data input to the model. To distinguish from model generation operations, model training may be referred to herein as optimization or optimization training. In this context, “optimization” refers to improving a metric, and does not mean finding an ideal (e.g., global maximum or global minimum) value of the metric. Examples of optimization trainers include, without limitation, backpropagation trainers, derivative free optimizers (DFOs), and extreme learning machines (ELMs). As one example of training a model, during supervised training of a neural network, an input data sample is associated with a label. When the input data sample is provided to the model, the model generates output data, which is compared to the label associated with the input data sample to generate an error value. Parameters of the model are modified in an attempt to reduce (e.g., optimize) the error value. As another example of training a model, during unsupervised training of an autoencoder, a data sample is provided as input to the autoencoder, and the autoencoder reduces the dimensionality of the data sample (which is a lossy operation) and attempts to reconstruct the data sample as output data. In this example, the output data is compared to the input data sample to generate a reconstruction loss, and parameters of the autoencoder are modified in an attempt to reduce (e.g., optimize) the reconstruction loss.
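As an illustrative, non-limiting sketch of optimization in the sense used above (improving a metric rather than finding an ideal value), the following Python example uses a simple derivative-free random search to reduce an error value computed by comparing model output to labels. The one-parameter model, the perturbation scale, the fixed random seed, and the sample data are assumptions made for illustration, and the random search shown is only one simple kind of derivative-free optimizer.

```python
import random


def error_value(weight, samples):
    """Mean squared error between model output (weight * x) and the labels."""
    return sum((weight * x - label) ** 2 for x, label in samples) / len(samples)


def derivative_free_training(samples, rounds=200, seed=0):
    """Perturb the parameter at random and keep only changes that
    reduce the error value."""
    rng = random.Random(seed)
    weight = 0.0
    best_error = error_value(weight, samples)
    for _ in range(rounds):
        candidate = weight + rng.gauss(0.0, 0.5)
        candidate_error = error_value(candidate, samples)
        if candidate_error < best_error:
            weight, best_error = candidate, candidate_error
    return weight, best_error


samples = [(1.0, 3.0), (2.0, 6.0)]            # labeled input data samples
initial_error = error_value(0.0, samples)
trained_weight, final_error = derivative_free_training(samples)
```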

As another example, to use supervised training to train a model to perform a classification task, each data element of a training data set may be labeled to indicate a category or categories to which the data element belongs. In this example, during the creation/training phase, data elements are input to the model being trained, and the model generates output indicating categories to which the model assigns the data elements. The category labels associated with the data elements are compared to the categories assigned by the model. The computer modifies the model until the model accurately and reliably (e.g., within some specified criteria) assigns the correct labels to the data elements. In this example, the model can subsequently be used (in a runtime phase) to receive unknown (e.g., unlabeled) data elements, and assign labels to the unknown data elements. In an unsupervised training scenario, the labels may be omitted. During the creation/training phase, model parameters may be tuned by the training algorithm in use such that, during the runtime phase, the model is configured to determine which of multiple unlabeled “clusters” an input data sample is most likely to belong to.

As another example, to train a model to perform a regression task, during the creation/training phase, one or more data elements of the training data are input to the model being trained, and the model generates output indicating a predicted value of one or more other data elements of the training data. The predicted values of the training data are compared to corresponding actual values of the training data, and the computer modifies the model until the model accurately and reliably (e.g., within some specified criteria) predicts values of the training data. In this example, the model can subsequently be used (in a runtime phase) to receive data elements and predict values that have not been received. To illustrate, the model can analyze time series data, in which case, the model can predict one or more future values of the time series based on one or more prior values of the time series.

In some aspects, the output of a model can be subjected to further analysis operations to generate a desired result. To illustrate, in response to particular input data, a classification model (e.g., a model trained to perform classification tasks) may generate output including an array of classification scores, such as one score per classification category that the model is trained to assign. Each score is indicative of a likelihood (based on the model's analysis) that the particular input data should be assigned to the respective category. In this illustrative example, the output of the model may be subjected to a softmax operation to convert the output to a probability distribution indicating, for each category label, a probability that the input data should be assigned the corresponding label. In some implementations, the probability distribution may be further processed to generate a one-hot encoded array. In other examples, other operations that retain one or more category labels and a likelihood value associated with each of the one or more category labels can be used.
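As an illustrative, non-limiting sketch of the post-processing described above, the following Python code converts an array of classification scores to a probability distribution using a softmax operation and then generates a one-hot encoded array. The function names and the example scores are assumptions introduced for illustration.

```python
import math

def softmax(scores):
    """Convert raw classification scores to a probability distribution."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def one_hot(probabilities):
    """Return an array with 1 at the most probable category and 0 elsewhere."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return [1 if i == best else 0 for i in range(len(probabilities))]

scores = [2.0, 0.5, -1.0]   # hypothetical scores, one per category
probs = softmax(scores)     # probabilities that sum to one
encoded = one_hot(probs)    # retains only the most likely category
```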

FIG. 1 illustrates an example of a computer system 100 configured to use machine-learning to facilitate reduction of artifacts in a solution to an inverse problem according to particular implementations. For example, the computer system 100 is configured to initiate, perform, or control one or more of the operations described with reference to FIG. 2 or 3 . The computer system 100 can be implemented as or incorporated into one or more of various other devices, such as a personal computer (PC), a tablet PC, a server computer, a personal digital assistant (PDA), a laptop computer, a desktop computer, a communications device, a wireless telephone, or any other machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine. Further, while a single computer system 100 is illustrated, the term “system” includes any collection of systems or sub-systems that individually or jointly execute a set, or multiple sets, of instructions to perform one or more computer functions.

While FIG. 1 illustrates one example of the computer system 100, other computer systems or computing architectures and configurations may be used for carrying out the artifact reduction operations disclosed herein. The computer system 100 includes the one or more processors 102. Each processor of the one or more processors 102 can include a single processing core or multiple processing cores that operate sequentially, in parallel, or sequentially at times and in parallel at other times. Each processor of the one or more processors 102 includes circuitry defining a plurality of logic circuits 104, working memory 106 (e.g., registers and cache memory), communication circuits, etc., which together enable the processor(s) 102 to control the operations performed by the computer system 100 and enable the processor(s) 102 to generate a useful result based on analysis of particular data and execution of specific instructions.

The processor(s) 102 are configured to interact with other components or subsystems of the computer system 100 via a bus 170. The bus 170 is illustrative of any interconnection scheme serving to link the subsystems of the computer system 100, external subsystems or devices, or any combination thereof. The bus 170 includes a plurality of conductors to facilitate communication of electrical and/or electromagnetic signals between the components or subsystems of the computer system 100. Additionally, the bus 170 includes one or more bus controllers or other circuits (e.g., transmitters and receivers) that manage signaling via the plurality of conductors and that cause signals sent via the plurality of conductors to conform to particular communication protocols.

In FIG. 1 , the computer system 100 includes one or more output devices 130, one or more input devices 110, and one or more interface devices 120. Each of the output device(s) 130, the input device(s) 110, and the interface device(s) 120 can be coupled to the bus 170 via a port or connector, such as a Universal Serial Bus port, a digital visual interface (DVI) port, a serial ATA (SATA) port, a small computer system interface (SCSI) port, a high-definition media interface (HDMI) port, or another serial or parallel port. In some implementations, one or more of the output device(s) 130, the input device(s) 110, or the interface device(s) 120 is coupled to or integrated within a housing with the processor(s) 102 and the memory device(s) 140, in which case the connections to the bus 170 can be internal, such as via an expansion slot or other card-to-card connector. In other implementations, the processor(s) 102 and the memory device(s) 140 are integrated within a housing that includes one or more external ports, and one or more of the output device(s) 130, the input device(s) 110, or the interface device(s) 120 is coupled to the bus 170 via the external port(s).

Examples of the output device(s) 130 include display devices, speakers, printers, televisions, projectors, or other devices to provide output of data (e.g., solution data representing a solution to an inverse problem) in a manner that is perceptible by a user. Examples of the input device(s) 110 include buttons, switches, knobs, a keyboard 112, a pointing device 114, a biometric device, a microphone, a motion sensor, or another device to detect user input actions. The pointing device 114 includes, for example, one or more of a mouse, a stylus, a track ball, a pen, a touch pad, a touch screen, a tablet, another device that is useful for interacting with a graphical user interface, or any combination thereof. A particular device may be an input device 110 and an output device 130. For example, the particular device may be a touch screen.

The interface device(s) 120 are configured to enable the computer system 100 to communicate with one or more other devices 124 directly or via one or more networks 122. For example, the interface device(s) 120 may encode data in electrical and/or electromagnetic signals that are transmitted to the other device(s) 124 as control signals or packet-based communication using pre-defined communication protocols. As another example, the interface device(s) 120 may receive and decode electrical and/or electromagnetic signals that are transmitted by the other device(s) 124. To illustrate, the other device(s) 124 may include observation sensor(s) that generate observation data 142. The electrical and/or electromagnetic signals can be transmitted wirelessly (e.g., via propagation through free space), via one or more wires, cables, optical fibers, or via a combination of wired and wireless transmission. The observation data 142 can include or correspond to waveform return data, such as data descriptive of seismic returns, acoustic returns, or electromagnetic returns.

The computer system 100 also includes the one or more memory devices 140. The memory device(s) 140 include any suitable computer-readable storage device depending on, for example, whether data access needs to be bi-directional or unidirectional, speed of data access required, memory capacity required, other factors related to data access, or any combination thereof. Generally, the memory device(s) 140 includes some combinations of volatile memory devices and non-volatile memory devices, though in some implementations, only one or the other may be present. Examples of volatile memory devices and circuits include registers, caches, latches, many types of random-access memory (RAM), such as dynamic random-access memory (DRAM), etc. Examples of non-volatile memory devices and circuits include hard disks, optical disks, flash memory, and certain types of RAM, such as resistive random-access memory (ReRAM). Other examples of both volatile and non-volatile memory devices can be used as well, or in the alternative, so long as such memory devices store information in a physical, tangible medium. Thus, the memory device(s) 140 include circuits and structures and are not merely signals or other transitory phenomena (i.e., are non-transitory media).

In the example illustrated in FIG. 1 , the memory device(s) 140 store instructions 144 that are executable by the processor(s) 102 to perform various operations and functions. The instructions 144 include instructions to enable the various components and subsystems of the computer system 100 to operate, interact with one another, and interact with a user, such as a basic input/output system (BIOS) 146 and an operating system (OS) 148. Additionally, the instructions 144 include one or more applications 150, scripts, or other program code to enable the processor(s) 102 to perform the operations described herein. For example, in FIG. 1 , the instructions 144 include model trainer instructions 152, which are executable by the processor(s) 102 to initiate, control, or perform one or more of the operations described with reference to FIG. 3 . Additionally, in the example of FIG. 1 , the instructions 144 include an inverse problem engine 154 that is configured to generate a solution to an inverse problem based on observation data 142. In the example illustrated in FIG. 1 , the inverse problem engine 154 includes a preprocessor 156, physics-based model instructions 158 and gradient descent artifact reduction instructions 160. Additionally, in the example illustrated in FIG. 1 , the gradient descent artifact reduction instructions 160 include a machine-learning model 162. As described further below, the physics-based model instructions 158 and the gradient descent artifact reduction instructions 160 are configured to interact (e.g., to exchange solution data) to generate a solution to an inverse problem. In a particular implementation, the solution to the inverse problem is represented as image data (e.g., a reflectivity image), which can be rendered by a rendering engine 164 to generate a graphical user interface (GUI) 132 for display via one of the output device(s) 130.

As one example, the inverse problem engine 154, when executed by the processor(s) 102 causes the processor(s) 102 to initiate, perform, or control an iterative process in which the physics-based model instructions 158 generate, based on the observations 142, first solution data that is descriptive of a first estimated solution to the inverse problem. In this example, the first solution data is provided as input to the gradient descent artifact reduction instructions 160, and one or more parameters of a gradient descent artifact reduction process of the gradient descent artifact reduction instructions 160 are initialized based on the first solution data. The gradient descent artifact reduction instructions 160 perform a plurality of iterations of the gradient descent artifact reduction process to generate second solution data. As a result of operations performed by the gradient descent artifact reduction process, artifacts in the second solution data are reduced relative to artifacts in the first solution data.

In a particular implementation, an iteration of the gradient descent artifact reduction process includes determining, using a machine-learning model 162, a value of a gradient metric associated with particular solution data (e.g., the first solution data or solution data generated by a prior iteration of the gradient descent artifact reduction process). The iteration also includes adjusting the particular solution data based on the value of the gradient metric to generate updated solution data.

As one example, the inverse problem engine 154 may perform operations as described by the following pseudocode:

x₀ ← RTM(k)
for n iterations:
 for m artifact levels:
  α_(m) = ε*λ_(m)/λ_(min)
  Optional or selective, regularize
  for t=1,..T Langevin steps:
   x_(t) ← x_(t−1) + α_(m) S(x_(t−1), λ_(m))
 Optional or selective, for p shots:
  Perform physics-based modeling to generate revised x_(t)

In the pseudocode above, RTM(k) represents a result of reverse time migration of k observations, where k is an integer greater than one. For example, RTM(8) refers to solution data based on reverse time migration of 8 of the observations 142. The solution data generated by RTM(k) is used as an initial estimate (x₀) of a solution to the inverse problem. Additionally, in the pseudocode above, n is a configurable counter having a value greater than or equal to one. Further, in the pseudocode above, m is a counter indicating a number of artifact levels over which annealed Langevin dynamic sampling is to be performed, where m is an integer greater than or equal to one and less than or equal to a count of the total number of observations 142 that are available. Generally, m is set within a range from a smallest number of observations (referred to as k_(min)) that can be used to generate acceptable solution data to a largest number of observations (referred to as k_(max)) that the inverse problem engine 154 is allowed to use. In a particular implementation, k_(min) is associated with a largest artifact level, λ_(max) (e.g., strongest artifacts in the solution data) used to train the machine-learning model 162, and k_(max) is associated with a smallest artifact level, λ_(min) (e.g., weakest artifacts in the solution data) used to train the machine-learning model 162.

Additionally, in the pseudocode above, α_(m) is a step size parameter used by the Langevin operations in the inner loop and is annealed (e.g., iteratively decreased) based on a ratio of the current artifact level, λ_(m), to the smallest artifact level, λ_(min), used to train the machine-learning model 162 adjusted by a configurable parameter ε.

Further, in the pseudocode above, an inner loop performs T iterations to modify the solution data x to decrease artifacts present in the solution data, where T is an integer greater than or equal to two. Generally, good results have been achieved with values of T on the order of 100 to 200. In each inner loop iteration, solution data x_(t) is determined by adjusting prior solution data x_(t-1) based on a gradient metric α_(m) S(x_(t-1), λ_(m)), where S(x_(t-1), λ_(m)) is an output of the machine-learning model 162 based on the prior solution data x_(t-1) and the current artifact level, λ_(m).

In some implementations, after T iterations of the inner loop to generate solution data x_(t), the solution data may be regularized. In a particular implementation, the solution data may be regularized based on total variation of solution data of the plurality of iterations. In some implementations, the regularization is used to conform the solution data to specified expectations, such as physical constraints of the observed system. In some implementations, regularization is optional. To illustrate, regularization can be omitted entirely in some implementations. In some implementations, regularization is selectively performed. To illustrate, regularization may be applied to particular solution data (and not applied to other solution data) based on characteristics of the solution data or based on output of one or more of the operations performed by the pseudocode.

In some implementations, the pseudocode above also includes one or more iterations of the physics-based model instructions 158. For example, within the n iterations loop and after the T Langevin steps loop, the pseudocode may include performing one or more least mean square reverse time migration (LSRTM) iterations of p observations, where p is an integer greater than one. In some implementations, p is set equal to k. In some implementations, physics-based modelling after the T Langevin steps loop is optional. To illustrate, physics-based modelling can be omitted entirely in some implementations. In some implementations, physics-based modelling after the T Langevin steps loop is selectively performed. To illustrate, physics-based modelling may be performed based on characteristics of the solution data generated by the T Langevin steps loop or based on output of another operation performed by the pseudocode.
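As an illustrative, non-limiting sketch, the loop structure of the pseudocode above may be expressed in Python as follows. The function names, the list-based representation of solution data, the default parameter values, and the example data are assumptions introduced for illustration. In particular, toy_score is a hypothetical stand-in for the output S of the machine-learning model 162 and simply points from the current solution data toward a known clean signal; the optional regularization and physics-based modeling operations are represented by caller-supplied functions.

```python
def toy_score(x, lam, target):
    """Stand-in for S(x, lambda); ASSUMPTION: a real system would call
    the trained machine-learning model here."""
    return [(t - xi) / (lam * lam) for xi, t in zip(x, target)]

def reduce_artifacts(x0, levels, score, eps=0.05, n=1, T=100,
                     regularize=None, physics_step=None):
    """Follow the loop structure of the pseudocode.

    x0     : initial estimate, e.g., RTM(k) solution data (list of floats)
    levels : artifact levels lambda_m, ordered from largest to smallest
    score  : callable (x, lam) -> list of floats with the same length as x
    """
    lam_min = min(levels)
    x = list(x0)
    for _ in range(n):
        for lam in levels:
            alpha = eps * lam / lam_min        # step size for this level
            if regularize is not None:         # optional or selective
                x = regularize(x)
            for _ in range(T):                 # T Langevin steps
                s = score(x, lam)
                x = [xi + alpha * si for xi, si in zip(x, s)]
        if physics_step is not None:           # optional or selective
            x = physics_step(x)
    return x

clean = [0.0, 1.0, 0.0, -1.0]    # hypothetical artifact-free solution data
noisy = [0.3, 0.6, -0.4, -0.7]   # hypothetical RTM(k) initial estimate
result = reduce_artifacts(noisy, [1.0, 0.5],
                          lambda x, lam: toy_score(x, lam, clean))
```

With the toy score function, each inner-loop step moves the solution data a fixed fraction of the way toward the clean signal, so the sketch shows only the control flow and the role of the step size α_(m), not the behavior of a trained model.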

One benefit of using a gradient descent artifact reduction process based on the pseudocode above is that high-quality solutions (e.g., solutions with weaker or fewer artifacts) can be generated using fewer computing resources than would be used to generate similar high-quality solutions using reverse time migration alone. To illustrate, the images illustrated in FIGS. 5A-5C and 6A-6C show examples of results based on simulated seismic sensing. FIGS. 5A and 6A show images that were each generated using only reverse time migration based on 243 observations (commonly referred to as “shots” in seismic sensing). FIGS. 5B and 6B show images that were each generated using only reverse time migration based on 8 shots. Note that in FIG. 5B, significant visual artifacts are present in regions 502. Significant visual artifacts are also present in FIG. 6B in regions 602. FIGS. 5C and 6C show images that were each generated using reverse time migration based on 8 shots and a gradient descent artifact reduction process based on the pseudocode above. To generate FIGS. 5C and 6C, n was set to 1, m was set to 1 and T was set to 200, R was zeroed out, and no LSRTM iterations were performed after the T Langevin steps loop.

Comparison of FIG. 5B with FIG. 5C shows that the gradient descent artifact reduction process significantly reduced the number and/or visual strength of the artifacts present in FIG. 5C as compared to the artifacts present in FIG. 5B. Likewise, comparison of FIG. 6B with FIG. 6C shows that the gradient descent artifact reduction process significantly reduced the number and/or visual strength of the artifacts present in FIG. 6C as compared to the artifacts present in FIG. 6B. For many purposes, FIGS. 5C and 6C may be useful substitutes for FIGS. 5A and 6A; however, generation of FIGS. 5C and 6C used significantly fewer computing resources (e.g., power, processor cycles, memory) than generation of FIGS. 5A and 6A. Further, significant time and expense can be saved by generating only the 8 shots used for FIGS. 5C and 6C rather than the 243 shots as used for FIGS. 5A and 6A.

Returning to FIG. 1 , in a particular implementation, the model trainer 152 is executable by the processor(s) 102 to train the machine-learning model 162. Although FIG. 1 illustrates the model trainer 152 and the inverse problem engine 154 in the same computing device 100, in other implementations, the model trainer 152 and the inverse problem engine 154 may be executed at distinct computing devices. For example, the model trainer 152 may execute at one of the other device(s) 124 to train the machine-learning model 162, and the machine-learning model 162 (or an instance thereof) may subsequently be loaded to the computing device 100 for execution.

In a particular implementation, the machine-learning model 162 includes or corresponds to a score-matching network, and the model trainer 152 is configured to train the score-matching network. Training the score-matching network includes, for example, obtaining multiple sets of solution data based on a plurality of observations (e.g., the observations 142 of FIG. 1 ). In this example, each set of solution data corresponds to a physics-based solution to the inverse problem, and each set of solution data is associated with a respective artifact level. In general, the artifact level of a set of solution data is inversely proportional to the number of observations used to generate the set of solution data (e.g., via RTM). Multiple sets of training data are generated based on the multiple sets of solution data. For example, each set of training data is based on one or more sets of solution data associated with a respective artifact level. To illustrate, a set of training data associated with a particular artifact level is determined based on differences between a set of solution data associated with a lowest artifact level and a set of solution data with the particular artifact level.

The score matching network is trained using the multiple sets of training data. For example, the score matching network may be trained by adjusting parameters of the score matching network to decrease a value of an objective function. In this example, the value of the objective function represents a weighted sum of values of objective functions for multiple different artifact levels. To illustrate, the objective function for a particular artifact level λ may be represented by:

$\ell(\theta; \lambda) = \frac{1}{2}\,\mathbb{E}_{p_{data}(x)}\,\mathbb{E}_{{RTM}(k)}\left[ \left\| s_{\theta}(\tilde{x}, \lambda) + \frac{x_{{RTM}(K)} - x_{{RTM}(k)}}{\lambda_{k}^{2}} \right\|_{2}^{2} \right]$

In this example, the objective function used to train the score matching network may be represented by:

${\mathcal{L}\left( {\theta;\left\{ \lambda \right\}_{i = 1}^{L}} \right)} = {\frac{1}{L}{\sum_{i = 1}^{L}{{\gamma\left( \lambda_{k_{i}}^{2} \right)}{\ell\left( {\theta;\lambda_{i}} \right)}}}}$

In the objective function for a particular artifact level λ, p_(data)(x) refers to a probability density function of a data set x, $\mathbb{E}_{p_{data}(x)}$ represents an expectation over p_(data)(x), and $\mathbb{E}_{{RTM}(k)}$ represents an expectation over RTM(k). $s_{\theta}(\tilde{x}, \lambda)$ represents solution data generated by the machine-learning model 162 for a particular artifact level λ. Further, x_(RTM(K)) represents solution data generated by a physics-based model (e.g., RTM) based on K observations, where K is a count of observations of the largest set of observations used for any solution in the training data, and x_(RTM(k)) represents solution data generated by the physics-based model (e.g., RTM) based on k observations, where k is an integer greater than one and less than K. Since larger sets of observations correspond to lower levels of artifacts,

$\frac{x_{{RTM}(K)} - x_{{RTM}(k)}}{\lambda_{k}^{2}}$

represents a value of an error metric that is based on a difference between solution data associated with the particular artifact level (corresponding to using k observations) and solution data associated with a lowest artifact level (corresponding to using K observations) of the multiple sets of solution data used to generate the training data.

In the objective function used to train the score matching network, L is the total count of artifact levels used, and γ is a function of a fitting parameter that is based on an error metric associated with the particular artifact level. In some implementations, γ is equal to λ².
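As an illustrative, non-limiting sketch, the two objective functions above may be evaluated as follows in Python. The sketch makes several assumptions that are not stated in the disclosure: the expectations are approximated by sample means, the input x̃ to the score function is taken to be the RTM(k) solution data, solution data is represented as lists of floats, and γ is taken to be λ². The function names and example values are hypothetical.

```python
def level_loss(score_fn, samples, lam):
    """Estimate l(theta; lambda) as a sample mean.

    samples  : list of (x_K, x_k) pairs of RTM(K) and RTM(k) solution data
    score_fn : callable (x_k, lam) -> list of floats, same length as x_k
    """
    total = 0.0
    for x_K, x_k in samples:
        s = score_fn(x_k, lam)
        residual = [si + (a - b) / lam ** 2 for si, a, b in zip(s, x_K, x_k)]
        total += 0.5 * sum(r * r for r in residual)   # half squared 2-norm
    return total / len(samples)

def total_loss(score_fn, samples_by_level):
    """Weighted average over artifact levels, assuming gamma = lambda**2."""
    L = len(samples_by_level)
    return sum(lam ** 2 * level_loss(score_fn, samples, lam)
               for lam, samples in samples_by_level.items()) / L

# Hypothetical data: one (x_K, x_k) pair for each of two artifact levels.
data = {
    1.0: [([1.0, 2.0], [0.0, 2.0])],
    2.0: [([1.0, 2.0], [1.0, 0.0])],
}
zero_score = lambda x, lam: [0.0] * len(x)   # untrained placeholder model
loss = total_loss(zero_score, data)
```

During training, parameters of the score matching network would be adjusted to decrease this value; the placeholder model here has no parameters and is used only to show how the terms combine.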

In a particular implementation, values of λ for a particular count of observations can be determined by determining multiple RTM(k) solutions for the same value of k. For example, from among a large set of observation data, multiple subsets of k observations can be selected. To illustrate, in the example of seismic sampling, different source and/or receiver positions can be selected for different subsets of k observations. The RTM(k) values for a particular value of k can be compared to RTM(K) to determine a mean square error for the artifact level associated with k. The value of λ for the particular count of observations k can be determined by plotting the mean square error values with respect to a normalized count of observations (e.g., k_(min)/k) and fitting a line to the plotted points.

FIG. 4 is a diagram illustrating particular aspects of determining parameters of a gradient descent artifact reduction system. In particular, FIG. 4 illustrates a plot of data points with k_(min)/k on the x-axis and mean square error (MSE) of RTM(K)-RTM(k) on the y-axis. In FIG. 4 , lines 402, 404, 408, and 410 represent data for a particular slice (e.g., a two-dimensional visualization) of an observed system. For example, a line 402 connects maximum MSE(RTM(K)-RTM(k)) values for the slice, a line 410 connects minimum MSE(RTM(K)-RTM(k)) values for the slice, a line 404 connects values of a mean of a distribution of values of the MSE(RTM(K)-RTM(k)) for the slice, and a line 408 represents a curve fit to values connected by the line 404. A line 406 represents a curve fit based on averaging across multiple slices. The line 406 represents values of λ².
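As an illustrative, non-limiting sketch of the line fitting described above, the following Python code fits a line to mean square error values plotted against the normalized count of observations k_(min)/k and evaluates the fitted line to obtain a value of λ² for each count of observations. The use of ordinary least squares, the function name, and the numeric values are assumptions introduced for illustration and are not measured data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

k_min = 8
counts = [8, 16, 32, 64]             # hypothetical values of k
mse = [0.40, 0.20, 0.10, 0.05]       # hypothetical mean MSE(RTM(K)-RTM(k))
xs = [k_min / k for k in counts]     # normalized count of observations
slope, intercept = fit_line(xs, mse)

# Value of lambda squared for each k, read from the fitted line.
lambda_sq = {k: slope * x + intercept for k, x in zip(counts, xs)}
```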

In an alternative embodiment, dedicated hardware implementations, such as application specific integrated circuits, programmable logic arrays and other hardware devices, can be constructed to implement one or more of the operations described herein. Accordingly, the present disclosure encompasses software, firmware, and hardware implementations.

FIG. 2 is a flow chart of an example of a method of reducing artifacts in solution data associated with an inverse problem. One or more operations described with reference to FIG. 2 may be performed by the computing device 100 of FIG. 1 , such as by the processor(s) 102 executing the instructions 144.

The method 200 includes, at 202, determining, using a physics-based model and based on a plurality of observations, first solution data. The first solution data is descriptive of a first estimated solution to an inverse problem associated with the plurality of observations. The first solution data includes artifacts due, at least in part, to a count of observations of the plurality of observations.

As a particular example, the observations may include waveform return data and determining the first solution data using the physics-based model may include performing one or more iterations of reverse time migration based on the plurality of observations. In this example, the first estimated solution to the inverse problem may include a reflectivity image. Examples of waveform return data include, without limitation, seismic returns, acoustic returns, or electromagnetic returns.

The method 200 also includes, at 204, performing a plurality of iterations of a gradient descent artifact reduction process (such as by execution of the gradient descent artifact reduction instructions 160 of FIG. 1 ) to generate second solution data. The artifacts are reduced in the second solution data relative to the first solution data. A particular iteration of the gradient descent artifact reduction process includes, at 206, determining, using a machine-learning model (e.g., a score-matching network, such as the machine learning model 162 of FIG. 1 ), a value of a gradient metric associated with particular solution data, and at 208, adjusting the particular solution data based on the value of the gradient metric to generate updated solution data.

In some implementations, the method 200 also includes, after determining the second solution data, providing the second solution data as input to the physics-based model to generate third solution data. For example, a result generated by the Langevin steps of the pseudocode for the inverse problem engine 154 may be subjected to one or more LSRTM iterations to further refine the solution. In some such implementations, the method 200 may further include performing a second plurality of iterations of the gradient descent artifact reduction process to generate fourth solution data, where the artifacts are reduced in the fourth solution data relative to the third solution data.

In some implementations, during the particular iteration of the gradient descent artifact reduction process, the particular solution data is adjusted further based on a step size parameter (e.g., α_(m)). Further, the step size parameter may be adjusted after one or more iterations. For example, the gradient descent artifact reduction process may include, after performing the plurality of iterations, adjusting the step size parameter and performing a second plurality of iterations using the adjusted step size parameter. In a particular aspect, the step size parameter is based on a ratio of an error metric associated with the count of observations and an error metric associated with a specified minimum count of observations.

In some implementations, the particular solution data generated via one or more iterations is adjusted further based on one or more regularization terms. In some implementations, the regularization term(s) are based, at least in part, on a total variation of solution data of the plurality of iterations of the gradient descent artifact reduction process. In the same or different implementations, one or more of the regularization term(s) is selected to enforce particular constraints. For example, a regularization term may be applied to enforce expected features of the solution data (e.g., an image) based on prior knowledge or assumptions about the observed system. As another example, a regularization term may be applied to enforce physics-based or experience-based expectations, such as an arrangement of features in the solution data.
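As an illustrative, non-limiting sketch, one quantity on which a regularization term may be based is computed below in Python. The sketch assumes that the solution data is a two-dimensional image represented as a list of rows and that the total variation is the anisotropic spatial total variation (the sum of absolute differences between vertically and horizontally adjacent values); the disclosure does not specify this form, and the function name and example images are hypothetical.

```python
def total_variation(image):
    """Sum of absolute differences between adjacent values of a 2D image."""
    rows, cols = len(image), len(image[0])
    tv = 0.0
    for r in range(rows):
        for c in range(cols):
            if r + 1 < rows:   # vertical neighbor
                tv += abs(image[r + 1][c] - image[r][c])
            if c + 1 < cols:   # horizontal neighbor
                tv += abs(image[r][c + 1] - image[r][c])
    return tv

smooth = [[1.0, 1.0], [1.0, 1.0]]    # hypothetical image with no variation
striped = [[1.0, 0.0], [1.0, 0.0]]   # hypothetical image with a vertical edge
tv_smooth = total_variation(smooth)
tv_striped = total_variation(striped)
```

A regularization term based on such a quantity penalizes solution data with many abrupt changes, which is one way to enforce an expectation that the image is piecewise smooth.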

One benefit of the method 200 is that it facilitates generation of high-quality solutions (e.g., solutions with weaker or fewer artifacts) using fewer computing resources than would be used to generate similar high-quality solutions using RTM alone. Further, the method 200 uses fewer observations than would be used to generate similar high-quality solutions using RTM alone. As a result, time and resources expended to gather observations can be reduced.

FIG. 3 is a flow chart of an example of a method of training a machine-learning model of a gradient descent artifact reduction system. One or more operations described with reference to FIG. 3 may be performed by the computing device 100 of FIG. 1 , such as by the processor(s) 102 executing the instructions 144.

The method 300 includes, at 302, obtaining a first batch of solution data. Each set of solution data of the first batch corresponds to a physics-based solution to an inverse problem and is associated with a respective artifact level. For example, the sets of solution data may be determined using reverse time migration (RTM). As one specific example, RTM is performed for each observation of a plurality of observations that are available to be processed to generate RTM data. In some implementations, the plurality of observations that are available to be processed may be selected from among a larger set of observations.

In this example, a first set of solution data may include RTM data based on k₁ observations, where k₁ is an integer that is greater than one and less than a total count of observations that are available for processing. Further, in this example, a second set of solution data may include RTM data based on k₂ observations, where k₂ is an integer that is greater than k₁ and less than a total count of observations that are available for processing. Similarly, other sets of solution data may include RTM data for other numbers of observations. In some implementations, a randomized process is used to select the specific observations (from among the set of observations available for processing) used to determine a particular set of solution data. In the same or different implementations, a randomized process is used to select a count of observations (e.g., a k value) used to determine a particular set of solution data.
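The randomized selection described above can be sketched as follows. This is a minimal illustration that assumes per-observation RTM data is already available as an array of images and that a subset is combined by averaging; the function name `build_batch`, the array layout, and the averaging rule are assumptions rather than details taken from the present disclosure.

```python
import numpy as np


def build_batch(per_obs_images, num_sets, rng):
    """Build sets of solution data from random subsets of per-observation images.

    per_obs_images: array of shape (n_obs, height, width), one image per observation.
    Returns a list of (k, image) pairs sorted by increasing k.
    """
    n_obs = per_obs_images.shape[0]
    # Each k must be greater than one and less than the total count.
    candidate_ks = np.arange(2, n_obs)
    if num_sets > candidate_ks.size:
        raise ValueError("not enough observations for the requested number of sets")
    # Randomly select the counts of observations (the k values).
    ks = np.sort(rng.choice(candidate_ks, size=num_sets, replace=False))
    batch = []
    for k in ks:
        # Randomly select which specific observations contribute to this set.
        idx = rng.choice(n_obs, size=k, replace=False)
        # Combining by averaging is an assumption made for this sketch.
        batch.append((int(k), per_obs_images[idx].mean(axis=0)))
    return batch
```

A caller might pass `np.random.default_rng(seed)` as `rng` so that the selection is reproducible across training runs.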

The method 300 includes, at 304, generating training data based on the first batch. The training data associated with a particular artifact level is determined based on differences between a set of solution data associated with a lowest artifact level and a set of solution data associated with the particular artifact level.
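One possible way to form such training data is sketched below. The sketch assumes each set of solution data carries a scalar artifact level (lower meaning fewer artifacts) and pairs each set with its difference from the lowest-artifact set; the function name `make_training_data` and the dictionary keys are hypothetical.

```python
import numpy as np


def make_training_data(batch):
    """Pair each set of solution data with its difference from the reference.

    batch: list of (artifact_level, image) pairs; the pair with the smallest
    artifact_level is treated as the lowest-artifact reference (assumption).
    """
    ref_level, reference = min(batch, key=lambda item: item[0])
    examples = []
    for level, image in batch:
        if level == ref_level:
            continue  # the reference is not paired with itself
        examples.append(
            {"level": level, "input": image, "target": reference - image}
        )
    return examples
```

When artifact levels are derived from counts of observations, the set based on the largest count would typically serve as the reference in this sketch.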

The method 300 includes, at 306, training a score matching network using the training data. For example, training the score matching network may include adjusting parameters of the score matching network to decrease a value of an objective function, where the value of the objective function represents a weighted sum of values of objective functions for multiple different artifact levels. In this example, a value of an objective function for a particular artifact level of the multiple different artifact levels is weighted based on a fitting parameter, and the fitting parameter is based on an error metric associated with the particular artifact level. To illustrate, a value of the error metric may be determined based on a difference between solution data associated with the particular artifact level and solution data associated with a lowest artifact level of the multiple sets of solution data.
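The weighted objective described above can be illustrated with the following sketch, which is loosely modeled on denoising score matching. It assumes the error metric is a root-mean-square difference, that the fitting parameter is the square of that error metric, and that the network is exposed as a callable `score_fn(image, sigma)`; these choices are illustrative assumptions and are not specified in the present disclosure.

```python
import numpy as np


def error_metric(image, reference):
    """Root-mean-square difference from the lowest-artifact reference."""
    return float(np.sqrt(np.mean((image - reference) ** 2)))


def weighted_objective(score_fn, images, reference):
    """Weighted sum of per-artifact-level objective values.

    score_fn(image, sigma) stands in for the score matching network (assumption).
    """
    total = 0.0
    for image in images:
        sigma = error_metric(image, reference)
        if sigma == 0.0:
            continue  # identical to the reference; contributes nothing
        # Target direction points from the image toward the reference.
        target = (reference - image) / sigma**2
        per_level = 0.5 * np.mean((score_fn(image, sigma) - target) ** 2)
        # Fitting parameter assumed to be sigma**2 for this sketch.
        total += sigma**2 * per_level
    return float(total)
```

With this weighting, levels with large and small error metrics contribute on a comparable scale, so no single artifact level dominates the sum. Training would then adjust the parameters behind `score_fn` to decrease the returned value.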

In some implementations, the score matching network may be further trained based on one or more additional batches of training data. For example, the method 300 may include obtaining one or more second batches of solution data corresponding to physics-based solutions to the inverse problem, generating additional training data based on the one or more second batches, and training the score matching network using the additional training data.

The systems and methods illustrated herein may be described in terms of functional block components, screen shots, optional selections and various processing steps. It should be appreciated that such functional blocks may be realized by any number of hardware and/or software components configured to perform the specified functions. For example, the system may employ various integrated circuit components, e.g., memory elements, processing elements, logic elements, look-up tables, and the like, which may carry out a variety of functions under the control of one or more microprocessors or other control devices. Similarly, the software elements of the system may be implemented with any programming or scripting language such as C, C++, C#, Java, JavaScript, VBScript, Macromedia Cold Fusion, COBOL, Microsoft Active Server Pages, assembly, PERL, PHP, AWK, Python, Visual Basic, SQL Stored Procedures, PL/SQL, any UNIX shell script, and extensible markup language (XML) with the various algorithms being implemented with any combination of data structures, objects, processes, routines or other programming elements. Further, it should be noted that the system may employ any number of techniques for data transmission, signaling, data processing, network control, and the like.

The systems and methods of the present disclosure may be embodied as a customization of an existing system, an add-on product, a processing apparatus executing upgraded software, a standalone system, a distributed system, a method, a data processing system, a device for data processing, and/or a computer program product. Accordingly, any portion of the system or a module or a decision model may take the form of a processing apparatus executing code, an internet based (e.g., cloud computing) embodiment, an entirely hardware embodiment, or an embodiment combining aspects of the internet, software and hardware. Furthermore, the system may take the form of a computer program product on a computer-readable storage medium or device having computer-readable program code (e.g., instructions) embodied or stored in the storage medium or device. Any suitable computer-readable storage medium or device may be utilized, including hard disks, CD-ROM, optical storage devices, magnetic storage devices, and/or other storage media. As used herein, a “computer-readable storage medium” or “computer-readable storage device” is not a signal.

Systems and methods may be described herein with reference to screen shots, block diagrams and flowchart illustrations of methods, apparatuses (e.g., systems), and computer media according to various aspects. It will be understood that each functional block of the block diagrams and flowchart illustrations, and combinations of functional blocks in the block diagrams and flowchart illustrations, respectively, can be implemented by computer program instructions.

Computer program instructions may be loaded onto a computer or other programmable data processing apparatus to produce a machine, such that the instructions that execute on the computer or other programmable data processing apparatus create means for implementing the functions specified in the flowchart block or blocks. These computer program instructions may also be stored in a computer-readable memory or device that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart block or blocks. The computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart block or blocks.

Accordingly, functional blocks of the block diagrams and flowchart illustrations support combinations of means for performing the specified functions, combinations of steps for performing the specified functions, and program instruction means for performing the specified functions. It will also be understood that each functional block of the block diagrams and flowchart illustrations, and combinations of functional blocks in the block diagrams and flowchart illustrations, can be implemented by either special purpose hardware-based computer systems which perform the specified functions or steps, or suitable combinations of special purpose hardware and computer instructions.

Particular aspects of the disclosure are described below in the following Examples:

According to Example 1, a method includes: determining, using a physics-based model and based on a plurality of observations, first solution data, the first solution data descriptive of a first estimated solution to an inverse problem associated with the plurality of observations, wherein the first solution data includes artifacts due, at least in part, to a count of observations of the plurality of observations; and performing a plurality of iterations of a gradient descent artifact reduction process to generate second solution data, wherein the artifacts are reduced in the second solution data relative to the first solution data, and wherein a particular iteration of the gradient descent artifact reduction process includes: determining, using a machine-learning model, a value of a gradient metric associated with particular solution data; and adjusting the particular solution data based on the value of the gradient metric to generate updated solution data.

Example 2 includes the method of Example 1, wherein determining the first solution data using the physics-based model includes performing reverse time migration based on at least a subset of the plurality of observations.

Example 3 includes the method of Example 1 or Example 2, wherein the plurality of observations includes waveform return data.

Example 4 includes the method of Example 3, wherein the waveform return data includes seismic returns, acoustic returns, or electromagnetic returns.

Example 5 includes the method of any of Examples 1 to 4, wherein the first estimated solution to the inverse problem includes a reflectivity image.

Example 6 includes the method of any of Examples 1 to 5, further including, after determining the second solution data, providing the second solution data as input to the physics-based model to generate third solution data.

Example 7 includes the method of Example 6, further including performing a second plurality of iterations of the gradient descent artifact reduction process to generate fourth solution data, wherein the artifacts are reduced in the fourth solution data relative to the third solution data.

Example 8 includes the method of any of Examples 1 to 7, wherein, during the particular iteration, the particular solution data is adjusted further based on a step size parameter.

Example 9 includes the method of Example 8, further including, after performing the plurality of iterations: adjusting the step size parameter; and performing a second plurality of iterations.

Example 10 includes the method of Example 8 or Example 9, wherein the step size parameter is based on a ratio of an error metric associated with the count of observations and an error metric associated with a specified minimum count of observations.

Example 11 includes the method of any of Examples 1 to 10, wherein, during the particular iteration, the particular solution data is adjusted further based on a regularization term that is based on total variation of solution data of the plurality of iterations of the gradient descent artifact reduction process.

Example 12 includes the method of any of Examples 1 to 11, wherein the machine-learning model corresponds to a score-matching network.

According to Example 13, a system includes one or more processors configured to: determine, using a physics-based model and based on a plurality of observations, first solution data, the first solution data descriptive of a first estimated solution to an inverse problem associated with the plurality of observations, wherein the first solution data includes artifacts due, at least in part, to a count of observations of the plurality of observations; and perform a plurality of iterations of a gradient descent artifact reduction process to generate second solution data, wherein the artifacts are reduced in the second solution data relative to the first solution data, and wherein a particular iteration of the gradient descent artifact reduction process includes: determining, using a machine-learning model, a value of a gradient metric associated with particular solution data; and adjusting the particular solution data based on the value of the gradient metric to generate updated solution data.

Example 14 includes the system of Example 13, wherein determining the first solution data using the physics-based model includes performing reverse time migration based on at least a subset of the plurality of observations.

Example 15 includes the system of Example 13 or Example 14, wherein the plurality of observations includes waveform return data.

Example 16 includes the system of Example 15, wherein the waveform return data includes seismic returns, acoustic returns, or electromagnetic returns.

Example 17 includes the system of any of Examples 13 to 16, wherein the first estimated solution to the inverse problem includes a reflectivity image.

Example 18 includes the system of any of Examples 13 to 17, wherein the one or more processors are further configured to, after determining the second solution data, provide the second solution data as input to the physics-based model to generate third solution data.

Example 19 includes the system of Example 18, wherein the one or more processors are further configured to perform a second plurality of iterations of the gradient descent artifact reduction process to generate fourth solution data, wherein the artifacts are reduced in the fourth solution data relative to the third solution data.

Example 20 includes the system of any of Examples 13 to 18, wherein the one or more processors are configured to, during the particular iteration, adjust the particular solution data based on a step size parameter.

Example 21 includes the system of Example 20, wherein the one or more processors are further configured to, after performing the plurality of iterations: adjust the step size parameter; and perform a second plurality of iterations.

Example 22 includes the system of Example 20 or Example 21, wherein the step size parameter is based on a ratio of an error metric associated with the count of observations and an error metric associated with a specified minimum count of observations.

Example 23 includes the system of any of Examples 13 to 22, wherein, during the particular iteration, the one or more processors are configured to adjust the particular solution data based on a regularization term that is based on total variation of solution data of the plurality of iterations of the gradient descent artifact reduction process.

Example 24 includes the system of any of Examples 13 to 23, wherein the machine-learning model corresponds to a score-matching network.

According to Example 25, a computer-readable storage device stores instructions that, when executed by one or more processors, cause the one or more processors to: determine, using a physics-based model and based on a plurality of observations, first solution data, the first solution data descriptive of a first estimated solution to an inverse problem associated with the plurality of observations, wherein the first solution data includes artifacts due, at least in part, to a count of observations of the plurality of observations; and perform a plurality of iterations of a gradient descent artifact reduction process to generate second solution data, wherein the artifacts are reduced in the second solution data relative to the first solution data, and wherein a particular iteration of the gradient descent artifact reduction process includes: determining, using a machine-learning model, a value of a gradient metric associated with particular solution data; and adjusting the particular solution data based on the value of the gradient metric to generate updated solution data.

Example 26 includes the computer-readable storage device of Example 25, wherein determining the first solution data using the physics-based model includes performing reverse time migration based on at least a subset of the plurality of observations.

Example 27 includes the computer-readable storage device of Example 25 or Example 26, wherein the plurality of observations includes waveform return data.

Example 28 includes the computer-readable storage device of Example 27, wherein the waveform return data includes seismic returns, acoustic returns, or electromagnetic returns.

Example 29 includes the computer-readable storage device of any of Examples 25 to 28, wherein the first estimated solution to the inverse problem includes a reflectivity image.

Example 30 includes the computer-readable storage device of any of Examples 25 to 29, wherein the instructions, when executed by the one or more processors, further cause the one or more processors to, after determining the second solution data, provide the second solution data as input to the physics-based model to generate third solution data.

Example 31 includes the computer-readable storage device of Example 30, wherein the instructions, when executed by the one or more processors, further cause the one or more processors to perform a second plurality of iterations of the gradient descent artifact reduction process to generate fourth solution data, wherein the artifacts are reduced in the fourth solution data relative to the third solution data.

Example 32 includes the computer-readable storage device of any of Examples 25 to 31, wherein the instructions, when executed by the one or more processors, further cause the one or more processors to, during the particular iteration, adjust the particular solution data further based on a step size parameter.

Example 33 includes the computer-readable storage device of Example 32, wherein the instructions, when executed by the one or more processors, further cause the one or more processors to, after performing the plurality of iterations: adjust the step size parameter; and perform a second plurality of iterations.

Example 34 includes the computer-readable storage device of Example 32 or Example 33, wherein the step size parameter is based on a ratio of an error metric associated with the count of observations and an error metric associated with a specified minimum count of observations.

Example 35 includes the computer-readable storage device of any of Examples 25 to 34, wherein, during the particular iteration, the particular solution data is adjusted further based on a regularization term that is based on total variation of solution data of the plurality of iterations of the gradient descent artifact reduction process.

Example 36 includes the computer-readable storage device of any of Examples 25 to 35, wherein the machine-learning model corresponds to a score-matching network.

According to Example 37, a method includes: obtaining a first batch of solution data, each set of solution data of the first batch corresponding to a physics-based solution to an inverse problem, wherein each set of solution data of the first batch is associated with a respective artifact level, and wherein the first batch includes sets of solution data associated with different artifact levels; generating training data based on the first batch, wherein training data associated with a particular artifact level is determined based on differences between a set of solution data associated with a lowest artifact level and a set of solution data associated with the particular artifact level; and training a score matching network using the training data.

Example 38 includes the method of Example 37, wherein training the score matching network includes adjusting parameters of the score matching network to decrease a value of an objective function, and wherein the value of the objective function represents a weighted sum of values of objective functions for multiple different artifact levels.

Example 39 includes the method of Example 37 or Example 38, wherein a value of an objective function for the particular artifact level is weighted based on a fitting parameter, the fitting parameter based on an error metric associated with the particular artifact level.

Example 40 includes the method of Example 39, wherein a value of the error metric is determined based on a difference between solution data associated with the particular artifact level and solution data associated with a lowest artifact level of multiple sets of solution data.

Example 41 includes the method of any of Examples 37 to 40, wherein obtaining the first batch of solution data includes: performing reverse time migration (RTM) for each observation of a plurality of observations to generate RTM data; combining a first subset of the RTM data to generate a first set of solution data of the first batch, wherein the first subset of the RTM data corresponds to k1 observations, where k1 is an integer greater than one and less than a total count of observations of the plurality of observations; and combining a second subset of the RTM data to generate a second set of solution data of the first batch, wherein the second subset of the RTM data corresponds to k2 observations, where k2 is an integer greater than k1 and less than the total count of observations of the plurality of observations.

Example 42 includes the method of Example 41, wherein the plurality of observations includes waveform return data.

Example 43 includes the method of Example 42, wherein the waveform return data includes seismic returns, acoustic returns, or electromagnetic returns.

Example 44 includes the method of any of Examples 41 to 43, wherein each set of solution data represents a reflectivity image based on a subset of observations of the plurality of observations.

Example 45 includes the method of any of Examples 37 to 44, further including: obtaining one or more second batches of solution data corresponding to physics-based solutions to the inverse problem; generating additional training data based on the one or more second batches; and training the score matching network using the additional training data.

According to Example 46, a system includes one or more processors configured to: obtain a first batch of solution data, each set of solution data of the first batch corresponding to a physics-based solution to an inverse problem, wherein each set of solution data of the first batch is associated with a respective artifact level, and wherein the first batch includes sets of solution data associated with different artifact levels; generate training data based on the first batch, wherein training data associated with a particular artifact level is determined based on differences between a set of solution data associated with a lowest artifact level and a set of solution data associated with the particular artifact level; and train a score matching network using the training data.

Example 47 includes the system of Example 46, wherein training the score matching network includes adjusting parameters of the score matching network to decrease a value of an objective function, wherein the value of the objective function represents a weighted sum of values of objective functions for multiple different artifact levels.

Example 48 includes the system of Example 47, wherein a value of an objective function for the particular artifact level is weighted based on a fitting parameter, the fitting parameter based on an error metric associated with the particular artifact level.

Example 49 includes the system of Example 48, wherein a value of the error metric is determined based on a difference between solution data associated with the particular artifact level and solution data associated with a lowest artifact level of multiple sets of solution data.

Example 50 includes the system of any of Examples 46 to 49, wherein obtaining the first batch of solution data includes: performing reverse time migration (RTM) for each observation of a plurality of observations to generate RTM data; combining a first subset of the RTM data to generate a first set of solution data of the first batch, wherein the first subset of the RTM data corresponds to k1 observations, where k1 is an integer greater than one and less than a total count of observations of the plurality of observations; and combining a second subset of the RTM data to generate a second set of solution data of the first batch, wherein the second subset of the RTM data corresponds to k2 observations, where k2 is an integer greater than k1 and less than the total count of observations of the plurality of observations.

Example 51 includes the system of Example 50, wherein the plurality of observations includes waveform return data.

Example 52 includes the system of Example 51, wherein the waveform return data includes seismic returns, acoustic returns, or electromagnetic returns.

Example 53 includes the system of any of Examples 50 to 52, wherein each set of solution data represents a reflectivity image based on a subset of observations of the plurality of observations.

Example 54 includes the system of any of Examples 46 to 53, wherein the one or more processors are further configured to: obtain one or more second batches of solution data corresponding to physics-based solutions to the inverse problem; generate additional training data based on the one or more second batches; and train the score matching network using the additional training data.

According to Example 55, a computer-readable storage device stores instructions that, when executed by one or more processors, cause the one or more processors to: obtain a first batch of solution data, each set of solution data of the first batch corresponding to a physics-based solution to an inverse problem, each set of solution data of the first batch associated with a respective artifact level, and the first batch including sets of solution data associated with different artifact levels; generate training data based on the first batch, wherein training data associated with a particular artifact level is determined based on differences between a set of solution data associated with a lowest artifact level and a set of solution data associated with the particular artifact level; and train a score matching network using the training data.

Example 56 includes the computer-readable storage device of Example 55, wherein training the score matching network includes adjusting parameters of the score matching network to decrease a value of an objective function, wherein the value of the objective function represents a weighted sum of values of objective functions for multiple different artifact levels.

Example 57 includes the computer-readable storage device of Example 56, wherein a value of an objective function for the particular artifact level is weighted based on a fitting parameter, the fitting parameter based on an error metric associated with the particular artifact level.

Example 58 includes the computer-readable storage device of Example 57, wherein a value of the error metric is determined based on a difference between solution data associated with the particular artifact level and solution data associated with a lowest artifact level of multiple sets of solution data.

Example 59 includes the computer-readable storage device of any of Examples 55 to 58, wherein obtaining the first batch of solution data includes: performing reverse time migration (RTM) for each observation of a plurality of observations to generate RTM data; combining a first subset of the RTM data to generate a first set of solution data of the first batch, wherein the first subset of the RTM data corresponds to k1 observations, where k1 is an integer greater than one and less than a total count of observations of the plurality of observations; and combining a second subset of the RTM data to generate a second set of solution data of the first batch, wherein the second subset of the RTM data corresponds to k2 observations, where k2 is an integer greater than k1 and less than the total count of observations of the plurality of observations.

Example 60 includes the computer-readable storage device of Example 59, wherein the plurality of observations includes waveform return data.

Example 61 includes the computer-readable storage device of Example 60, wherein the waveform return data includes seismic returns, acoustic returns, or electromagnetic returns.

Example 62 includes the computer-readable storage device of any of Examples 59 to 61, wherein each set of solution data represents a reflectivity image based on a subset of observations of the plurality of observations.

Example 63 includes the computer-readable storage device of any of Examples 55 to 62, wherein the instructions, when executed by the one or more processors, further cause the one or more processors to: obtain one or more second batches of solution data corresponding to physics-based solutions to the inverse problem; generate additional training data based on the one or more second batches; and train the score matching network using the additional training data.

Although the disclosure may include one or more methods, it is contemplated that it may be embodied as computer program instructions on a tangible computer-readable medium, such as a magnetic or optical memory or a magnetic or optical disk/disc. All structural, chemical, and functional equivalents to the elements of the above-described exemplary embodiments that are known to those of ordinary skill in the art are expressly incorporated herein by reference and are intended to be encompassed by the present claims. Moreover, it is not necessary for a device or method to address each and every problem sought to be solved by the present disclosure, for it to be encompassed by the present claims. Furthermore, no element, component, or method step in the present disclosure is intended to be dedicated to the public regardless of whether the element, component, or method step is explicitly recited in the claims. As used herein, the terms “comprises,” “comprising,” or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.

Changes and modifications may be made to the disclosed embodiments without departing from the scope of the present disclosure. These and other changes or modifications are intended to be included within the scope of the present disclosure, as expressed in the following claims. 

What is claimed is:
 1. A method comprising: determining, using a physics-based model and based on a plurality of observations, first solution data, the first solution data descriptive of a first estimated solution to an inverse problem associated with the plurality of observations, wherein the first solution data includes artifacts due, at least in part, to a count of observations of the plurality of observations; and performing a plurality of iterations of a gradient descent artifact reduction process to generate second solution data, wherein the artifacts are reduced in the second solution data relative to the first solution data, and wherein a particular iteration of the gradient descent artifact reduction process includes: determining, using a machine-learning model, a value of a gradient metric associated with particular solution data; and adjusting the particular solution data based on the value of the gradient metric to generate updated solution data.
 2. The method of claim 1, wherein determining the first solution data using the physics-based model comprises performing reverse time migration based on at least a subset of the plurality of observations.
 3. The method of claim 1, wherein the plurality of observations comprises waveform return data.
 4. The method of claim 3, wherein the waveform return data includes seismic returns, acoustic returns, or electromagnetic returns.
 5. The method of claim 1, wherein the first estimated solution to the inverse problem comprises a reflectivity image.
 6. The method of claim 1, further comprising, after determining the second solution data, providing the second solution data as input to the physics-based model to generate third solution data.
 7. The method of claim 6, further comprising performing a second plurality of iterations of the gradient descent artifact reduction process to generate fourth solution data, wherein the artifacts are reduced in the fourth solution data relative to the third solution data.
 8. The method of claim 1, wherein, during the particular iteration, the particular solution data is adjusted further based on a step size parameter.
 9. The method of claim 8, further comprising, after performing the plurality of iterations: adjusting the step size parameter; and performing a second plurality of iterations.
 10. The method of claim 8, wherein the step size parameter is based on a ratio of an error metric associated with the count of observations and an error metric associated with a specified minimum count of observations.
 11. The method of claim 1, wherein, during the particular iteration, the particular solution data is adjusted further based on a regularization term that is based on total variation of solution data of the plurality of iterations of the gradient descent artifact reduction process.
 12. The method of claim 1, wherein the machine-learning model corresponds to a score-matching network.
 13. A system comprising: one or more processors configured to: determine, using a physics-based model and based on a plurality of observations, first solution data, the first solution data descriptive of a first estimated solution to an inverse problem associated with the plurality of observations, wherein the first solution data includes artifacts due, at least in part, to a count of observations of the plurality of observations; and perform a plurality of iterations of a gradient descent artifact reduction process to generate second solution data, wherein the artifacts are reduced in the second solution data relative to the first solution data, and wherein a particular iteration of the gradient descent artifact reduction process includes: determining, using a machine-learning model, a value of a gradient metric associated with particular solution data; and adjusting the particular solution data based on the value of the gradient metric to generate updated solution data.
 14. The system of claim 13, wherein determining the first solution data using the physics-based model comprises performing reverse time migration based on at least a subset of the plurality of observations.
 15. The system of claim 13, wherein the plurality of observations comprises waveform return data, wherein the waveform return data includes seismic returns, acoustic returns, or electromagnetic returns, and wherein the first estimated solution to the inverse problem comprises a reflectivity image.
 16. The system of claim 13, wherein the one or more processors are further configured to, after determining the second solution data, provide the second solution data as input to the physics-based model to generate third solution data.
 17. The system of claim 16, wherein the one or more processors are further configured to perform a second plurality of iterations of the gradient descent artifact reduction process to generate fourth solution data, wherein the artifacts are reduced in the fourth solution data relative to the third solution data.
 18. The system of claim 13, wherein the one or more processors are further configured to: during the particular iteration, adjust the particular solution data further based on a step size parameter; and after performing the plurality of iterations: adjust the step size parameter; and perform a second plurality of iterations.
 19. The system of claim 13, wherein, during the particular iteration, the particular solution data is adjusted further based on a term that is based on total variation of solution data of the plurality of iterations of the gradient descent artifact reduction process.
 20. A computer-readable storage device storing instructions that, when executed by one or more processors, cause the one or more processors to: determine, using a physics-based model and based on a plurality of observations, first solution data, the first solution data descriptive of a first estimated solution to an inverse problem associated with the plurality of observations, wherein the first solution data includes artifacts due, at least in part, to a count of observations of the plurality of observations; and perform a plurality of iterations of a gradient descent artifact reduction process to generate second solution data, wherein the artifacts are reduced in the second solution data relative to the first solution data, and wherein a particular iteration of the gradient descent artifact reduction process includes: determining, using a machine-learning model, a value of a gradient metric associated with particular solution data; and adjusting the particular solution data based on the value of the gradient metric to generate updated solution data. 
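
As a non-limiting illustration, the iterations recited in claim 1, together with the step size parameter of claim 8, can be sketched as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: the names `reduce_artifacts`, `gradient_fn`, and `toy_gradient_fn` are hypothetical, and `toy_gradient_fn` is a simple analytic stand-in for the machine-learning model (for example, a score-matching network) that would supply the value of the gradient metric in practice.

```python
import numpy as np


def reduce_artifacts(first_solution, gradient_fn, step_size=0.1, num_iterations=50):
    """Iteratively adjust solution data using a supplied gradient metric.

    gradient_fn is a hypothetical callable standing in for the machine-learning
    model; it maps solution data to a gradient value of the same shape.
    """
    # Work on a copy so the first solution data is left unchanged.
    solution = np.asarray(first_solution, dtype=float).copy()
    for _ in range(num_iterations):
        # Determine the value of the gradient metric for the current solution data.
        gradient = gradient_fn(solution)
        # Adjust the solution data based on the gradient value and the step size.
        solution = solution - step_size * gradient
    return solution


def toy_gradient_fn(solution):
    """Assumed stand-in: gradient of 0.5 * ||x||^2, which pulls values toward zero.

    A trained model would instead estimate a gradient that points away from
    artifact-laden solutions; this toy function only exercises the loop.
    """
    return solution


# Placeholder "first solution data": random values playing the role of an
# artifact-laden image (not real migration output).
rng = np.random.default_rng(0)
first = rng.normal(size=(8, 8))
second = reduce_artifacts(first, toy_gradient_fn, step_size=0.1, num_iterations=50)
```

With the toy gradient, each iteration scales the solution data by (1 - step_size), so the loop's behavior can be checked in closed form. A second plurality of iterations with an adjusted step size, as in claim 9, would correspond to calling `reduce_artifacts` again on `second` with a different `step_size`.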